COMPARATIVE ANALYSIS OF METHODS OF VECTORIZATION OF HIGH DIMENSIONAL TEXT DATA
Abstract
This publication gives an overview of the problem of representing textual information for subsequent cluster analysis, within the framework of processing and managing high-dimensional information. Modern requirements for analytical, search, and recommendation information systems show that the current technology market still lacks a holistic solution that provides sufficient speed and quality of results. This problem entails the need for an objective analysis of existing solutions for representing information in vector space, in order to form a view of the advantages and disadvantages of the analyzed approaches, as well as criteria that allow one to implement one's own approach, free of the identified weaknesses. The work gives an idea of how thoroughly this limited subject area has been studied. Clustering of text data is the automatic formation of subsets whose elements are instances of documents from some unstructured sample of fixed dimension under study. This process can be classified as unsupervised learning, which implies the absence of an expert who personally assigns class indices to the original documents. However, cluster analysis is impossible without pre-processing: the input data must first be standardized and reduced to a single format and form. For this stage of the analysis, the article discusses methods for preprocessing text data. The novelty of the work lies in establishing a theoretical basis for the main methods of vectorization by systematizing and objectifying the proposed assumptions and by conducting a series of experimental studies. The difference from already published scientific works lies in the systematization of modern solutions and in hypotheses about the relevance and effectiveness of the authors' own hybridized approach to vectorization.
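To make the idea of representing documents in vector space concrete, the following is a minimal sketch of one common baseline, TF-IDF weighting followed by cosine similarity. The abstract does not name the specific methods compared, so the choice of TF-IDF, the whitespace tokenizer, the smoothed IDF formula, and the sample documents are all assumptions made for illustration only.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical minimal preprocessing: lowercase and split on whitespace.
    return text.lower().split()


def tfidf_vectors(docs):
    """Return (vocabulary, TF-IDF vectors) for a list of raw strings."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(t for doc in tokenized for t in set(doc))
    # Smoothed inverse document frequency (an assumed variant).
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        total = len(doc) or 1
        # Every document maps to a vector of the same fixed dimension.
        vectors.append([counts[t] / total * idf[t] for t in vocab])
    return vocab, vectors


def cosine(u, v):
    # Cosine similarity; defined as 0.0 if either vector is all zeros.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Illustrative documents (not taken from the article).
docs = [
    "clustering groups similar text documents",
    "text documents are grouped by clustering",
    "the market price of oil fell sharply",
]
vocab, vectors = tfidf_vectors(docs)
sim_01 = cosine(vectors[0], vectors[1])  # documents sharing several terms
sim_02 = cosine(vectors[0], vectors[2])  # documents sharing no terms
```

Once documents are vectors of equal length, any distance-based clustering algorithm can group them without expert-assigned labels, which is the unsupervised setting the abstract describes. In this sketch the first two documents share terms and so score higher than the unrelated pair.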
Journal
Journal title: Izvestiâ ÛFU
Year: 2023
ISSN: 1999-9429, 2311-3103
DOI: https://doi.org/10.18522/2311-3103-2023-2-212-226